Maximum Entropy Approach for Named Entity Recognition in Bengali and Hindi

Authors

  • Mohammad Hasanuzzaman
  • Asif Ekbal
  • Sivaji Bandyopadhyay

Abstract

This paper reports on the development of a Named Entity Recognition (NER) system for two leading Indian languages, Bengali and Hindi, using the Maximum Entropy (ME) framework. We have used the annotated corpora obtained from the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL), which are tagged with a fine-grained Named Entity (NE) tagset of twelve tags. A tag conversion routine has been developed to convert these corpora into a form tagged with four NE tags, namely Person name, Location name, Organization name and Miscellaneous name. The system makes use of contextual information about the words along with a variety of orthographic word-level features that are helpful in predicting the four NE classes. In this work, we have considered both language independent features, which apply to both languages, and language specific features of Bengali and Hindi. Evaluation results show that the use of linguistic features can improve the performance of the system. The 10-fold cross validation tests yield overall average recall, precision, and F-score values of 88.01%, 82.63%, and 85.22%, respectively, for Bengali and 86.4%, 79.23%, and 82.66%, respectively, for Hindi.
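
The abstract does not spell out the feature templates or the model code. As a rough illustration only, the Python sketch below shows how contextual and orthographic word-level features of the kind described might be encoded as binary feature strings for a Maximum Entropy classifier, how such a classifier turns feature weights into a conditional distribution p(y | x) over the four NE classes plus a non-entity label, and how an F-score follows from precision and recall. The function names (extract_features, me_probabilities, f_score), the context window size, the affix length, the feature string format, the "O" label and the weight values are all assumptions for illustration, not the authors' implementation; training the weights (for example with GIS or L-BFGS) is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: feature names, window size, affix length and
# weights below are assumptions, not the paper's exact configuration.


def extract_features(tokens: Sequence[str], i: int,
                     window: int = 2, affix_len: int = 3) -> List[str]:
    """Return binary feature strings for the token at position i."""
    feats: List[str] = []
    # Contextual features: the words within +/- `window` positions of token i,
    # with "<PAD>" standing in for positions outside the sentence.
    for offset in range(-window, window + 1):
        j = i + offset
        context_word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{offset}]={context_word}")
    word = tokens[i]
    # Word-level affix features: prefixes and suffixes of length 1..affix_len.
    for n in range(1, min(affix_len, len(word)) + 1):
        feats.append(f"pre{n}={word[:n]}")
        feats.append(f"suf{n}={word[-n:]}")
    # Orthographic, language independent features (the Bengali and Devanagari
    # scripts have no capitalization, so no case feature is used here).
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    if i == 0:
        feats.append("first_word")
    return feats


def me_probabilities(feats: Sequence[str],
                     weights: Dict[Tuple[str, str], float],
                     labels: Sequence[str]) -> Dict[str, float]:
    """Conditional ME distribution p(y | x) = exp(sum_k w_k f_k(x, y)) / Z(x).

    Each (feature, label) pair active for this token contributes its weight;
    pairs without a weight contribute 0.
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # normalizer Z(x)
    return {y: e / z for y, e in exps.items()}


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sent = ["Ram", "went", "to", "Delhi"]  # toy romanized tokens
    feats = extract_features(sent, 3, window=1, affix_len=2)
    print(feats)
    # Hypothetical trained weight favouring LOC when the current word is "Delhi".
    probs = me_probabilities(feats, {("w[0]=Delhi", "LOC"): 2.0},
                             ["PER", "LOC", "ORG", "MISC", "O"])
    print(max(probs, key=probs.get))
    print(round(f_score(79.23, 86.4), 2))
```

Run as a script, the sketch prints the seven features for "Delhi", picks LOC under the single hypothetical weight, and prints 82.66, the harmonic mean of the Hindi precision and recall reported above. The same formula applied to the Bengali averages gives about 85.24 rather than the reported 85.22; an average of per-fold F-scores need not equal the F-score of the averaged precision and recall, so a small gap of this kind is not surprising.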

Similar articles

A Two Stage Language Independent Named Entity Recognition for Indian Languages

This paper describes the development of a two-stage hybrid Named Entity Recognition (NER) system for Indian languages, particularly Hindi, Oriya, Bengali and Telugu. We have used both the statistical Maximum Entropy Model (MaxEnt) and the Hidden Markov Model (HMM) in this system. We have used a variety of features and contextual information for predicting the various Named Entity (NE) classes. T...

A Hybrid Approach for Named Entity Recognition in Indian Languages

In this paper we describe a hybrid system that applies a maximum entropy model (MaxEnt), language specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language specific rules are added to th...

Feature Subset Selection Using Genetic Algorithm for Named Entity Recognition

In this paper, a genetic algorithm (GA) is utilized to search for the appropriate feature combination for constructing a maximum entropy (ME) based classifier for named entity recognition (NER). Features are encoded in the chromosomes. The ME classifier is evaluated with 3-fold cross validation using the features encoded in a particular chromosome, and its average F-measure value is used as th...

A Hybrid Named Entity Recognition System for South and South East Asian Languages

In this paper we describe a hybrid system that applies a Maximum Entropy model (MaxEnt), language specific rules and gazetteers to the task of Named Entity Recognition (NER) in Indian languages designed for the IJCNLP NERSSEAL shared task. Starting with Named Entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language specific rules are added to th...

Named Entity Recognition in Hindi using Maximum Entropy and Transliteration

(NER) system becomes challenging if proper resources are not available. Gazetteer lists are often used for the development of NER systems. In many resource-poor languages gazetteer lists of proper size are not available, but sometimes relevant lists are available in English. Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have describ...

Publication date: 2009